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We introduce an auxiliary technique, called residual nudging, to the particle 
filter to enhance its performance in cases that it performs poorly. The main idea 
of residual nudging is to monitor, and if necessary, adjust the residual norm of a 
state estimate in the observation space so that it does not exceed a pre-specified 
threshold. We suggest a rule to choose the pre-specified threshold, and construct 
a state estimate accordingly to achieve this objective. Numerical experiments 
suggest that introducing residual nudging to a particle filter may (substantially) 
improve its performance, in terms of filter accuracy and/or stability against 
divergence, especially when the particle filter is implemented with a relatively 
small number of particles. Copyright © 2012 Royal Meteorological Society 
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1. Introduction methods that are employed to tackle the problem. The 

EnKF and the PF provide approximations to the optimal 
State estimation often arises in geosciences studies. so i ution obtained in the framework of recursive Bayesian 
Recursive Bayesian filtering approaches, including the estimation (RBE, see, for example, Arulampalam et al. 
ensemble Kalman filter (EnKF, see, for examples, Anderson 2 002, or Section 2). The EnKF approximates the prior and 
2001; Bishop et al. 2001; Burgers et al. 1998; Hoteit posterior probability density functions (pdfs) of the model 
et al. 2002; Whitaker and Hamill 2002) and the particle state by some Gaussian ones, which appears insufficient 
filter (PF, see, for examples, Pham 2001; Van Leeuwen 
2003, 2010), are among the most popular data assimilation 
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when the distribution of the model state is multi-modal*. 
In contrast, the PF approximates the prior and posterior 
pdfs of the model state by mixture models of Dirac delta 
functions (i.e., Monte Carlo approximation), in which the 
mass points are particles drawn from certain pdfs. As the 
number of particles increases, the mixture models of Dirac 
delta functions approach the targeted pdfs asymptotically, 
hence the solution of the PF converges to the optimal one 
in the framework of RBE (Doucet et al. 2001, ch. 2). The 
asymptotic convergence of the PF is achieved regardless of 
the presence of nonlinearity and non-Gaussianity in data 
assimilation. 

A well-known problem in applying the PF is the phe- 
nomenon of weight collapse, also known as weight degen- 
eracy or impoverishment (cf., for examples, Arulampalam 
et al. 2002; Bengtsson et al. 2008; Gordon et al. 1993; 
Snyder et al. 2008), in which the weight of a particular 
particle approaches one, and those associated with the 
remaining particles collapse to zero. In such circumstance, 
the effective sample size of the particle filter becomes very 
small, which often deteriorates the performance of the filter. 

In the literature two strategies are often employed to 
tackle the problem of weight collapse (Arulampalam et al. 
2002). One is to introduce a re-sampling step to the particle 
filter when the effective sample size is below a certain 
threshold. With the re-sampling step, a new set of particles 
is generated with identical weights. Many implementations 
of the particle filter differ from each other mainly at the 
re-sampling step, which, however, is a topic beyond the 
scope of this work. Readers are referred to, for examples, 
Arulampalam et al. (2002); Van Leeuwen (2009), for more 
information. A potential problem with the re-sampling 
strategy alone is that in certain circumstances, in order to 
avoid weight collapse, the number of particles may have to 
scale exponentially with the dimension of the model state 
(Bengtsson et al. 2008; Snyder et al. 2008). This implies 

* In such circumstances, it is more appropriate to use a mixture of Gaussian 
pdfs to approximate the distribution of the model state, see, for examples, 
Anderson and Anderson (1999); Bengtsson et al. (2003); Hoteit et al. 
(2008, 2012); Luo et al. (2010b). 
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that the PF may become prohibitively expensive for data 
assimilation in high-dimensional systems. 

Another strategy is to choose a good importance 
(or proposal) density from which the particles are 
drawn (Arulampalam et al. 2002; Bocquet et al. 2010; 
Van Leeuwen 2010). For instance, one may adopt an 
"optimal" importance density in the sense that, for a given 
particle at the current assimilation cycle, the weights of 
the samples drawn from the optimal importance density at 
the next assimilation cycle will be identical, regardless of 
the locations of the drawn samples (Arulampalam et al. 
2002, Eq. (53)). In a recent work, Bocquet et al. (2010) 
show that, in the 40-dimensional Lorenz 95 model (Lorenz 
and Emanuel 1998, L95 hereafter), the particle filter 
equipped with the optimal importance density (in many 
cases substantially) outperforms the conventional bootstrap 
particle filter (Gordon et al. 1993) when the sample size 
is no larger than 10000. A similar idea is also explored in 
Van Leeuwen (2010); Ades and van Leeuwen (2012). There 
the authors adopt an importance density through which the 
generated particles are equipped with almost equal weights. 
By a proper design of the importance density, the PF with 
only 20 particles can achieve an estimation accuracy that 
is comparable to that of the conventional methods with 
thousands of particles (Van Leeuwen 2010). 

Apart from weight degeneracy, another factor that also 
influences the practical performance of the PF is the slow 
convergence rate of Monte Carlo approximation. After all, 
in many real-world problems, one can only afford to run 
finitely many - often a small number of - particles with 
the timing and computational cost limitations. In such 
circumstances, the slow convergence rate of Monte Carlo 
approximation implies that a PF solution with only finitely 
many particles may not be able to converge sufficiently 
close to the optimal one, and that, in this specific context, it 
may become an unrealistic objective for one to achieve the 
asymptotic optimality of the PF. A certain gap might arise 
between the optimal solution and the approximate one of the 
PF, especially when the sample size of the PF is relatively 
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small. In this regard, introducing a re-sampling step to the 
PF alone may not be sufficient to address the effect of finite 
sample size. Instead, one may opt to seek some auxiliary 
technique to enhance the performance of the PF with a finite 
sample size, which is the focus of this study. 

In this work we consider one possible auxiliary 
technique, called residual nudging, which aims to provide 
certain compensation to the PF solution when the filter 
does not perform well. Here a "residual" is a vector in 
the observation space, and is defined as the projection 
of a state estimate onto the observation space subtracted 
by the corresponding observation. In residual nudging our 
objective is to make the (weighted) vector norm of the 
residual ("residual norm" for short) no larger than a pre- 
specified value. This is motivated by the observation that, 
if the residual norm is too large, then the corresponding 
state estimate is often a poor one. In such cases, it is better 
off to choose as the new estimate a state vector whose 
residual norm is smaller. In this sense, residual nudging can 
be considered as a safeguard strategy that helps a poorly- 
performing filter to perform less poorly by providing certain 
compensation to the original mean estimate of the PF. It, 
however, may not in its own right reduce the sample size 
requirement in order for the PF to obtain a reasonable 
approximate solution in data assimilation. Likewise, it 
neither solves the weight collapse (or degeneracy) problem 
in the PF. 

This study is organized as follows. Section 2 introduces 
the problem of our interest and presents the recursive 
Bayesian estimation as the conceptual solution. Section 3 
reviews the main steps in the particle filter, introduces the 
concept of residual nudging, and discusses how residual 
nudging can be implemented in a particle filter. Section 4 
examines and compares the performance of the regularized 
particle filter (as a representative of the various particle 
filters), and that of the regularized particle filter with 
residual nudging, in a linear scalar model. This example 
is used to examine the effect of residual nudging on the 
performance of the regularized particle filter, in case that 
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the filter performance is reasonably good with a relatively 
large sample size. Section 5 examines and compares the 
performance of the above two filters with the L95 model in 
different scenarios. In some experiments the performance 
of the regularized particle filter may be less satisfactory, 
due to the effect of finite sample sizes. In such cases, 
we show that residual nudging can help to improve the 
filter performance in terms of filter accuracy and/or stability 
against divergence. Section 6 summarizes the whole work 
and discusses possible extensions of the present study. 

2. Particle filtering 

Consider the state estimation problem in the following 
system 

x k = M k ,k-i (Xfc-i) + 4 1 , (la) 
y k =U k {x k ) + e° k . (lb) 

Here, £ M. n is the n-dimensional model state at time 
instant k, y k £ W the corresponding observation of x k , 
£ m g ]]jn t j le moc j e i error with zero mean and covariance 
matrix Q k , and e° k £ W the observation noise with zero 
mean and covariance matrix R/;. The transition operator 
A4 k)k -\ : M" — > R" maps x. k -i to x k , and the observation 
operator Hk : M™ — > R p projects from the state space 
onto the observation space. The problem of our interest is 
to estimate the posterior pdf of the model state x k at time 
instant k, given the observations Y k = {y k ,y k ^i, • ■ • } up 
to and including k, together with the prior pdf p (x,|Yj_i) 
of the model state Xj at some earlier instant i (i < k). 
For convenience of discussion, we assume that p < n 
throughout this work. 

Recursive Bayesian estimation (RBE) (Arulampalam 
et al. 2002) provides a probabilistic framework that 
recursively solves the state estimation problem in terms of 
some conditional pdfs. Let p(xfc|Yfc_i) be the prior pdf 
of Xfc conditioned on the observations Yfc_i up to and 
including time k — 1, but without the knowledge of the 
observation yet. Once the observation y^ is known, 
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one incorporates the information content of according total number of particles (called sample size hereafter). For 

to Bayes' rule to update the prior pdf to the posterior one notational convenience, let x^_ x i be the particles generated 

p(xfe|Yfe). By evolving p(xk\Yk) forward in time, one by are-sampling algorithm, and Wk-i,i the corresponding 

obtains a prior pdf p (xfc + i|Yfc) at the next time instant, weights >. In consistency with Eqs. (2) and (3), the PF has 

Concretely, the mathematical description of RBE consists the following steps, 
of (Amlampalam et al. 2002): 



Prediction step: 

p(x fc |Y fc _i) = yp(xfc|x fc _i)p(x fc -i|Y fe _i)dx fc _i 



(2) 



and Filtering step: 



Prediction step: The particles x^_ 1 i are integrated 
forward with the model to obtain the propagations x^ i at 
the next time instant k. The associated weights of the new 
particles x b ki remain to be Wk-i,i- This is equivalent to 
using the transition pdfs p rxfe|x|_ 1 A as the importance 
density to generate the particles x^. One can also use 
other importance densities for this purpose, as discussed 



/ i \ / i-wt \ previously. For examples, see, Amlampalam et al. (2002); 

/ IY \ = p(yfc|xfc)p(x fc |Y fc _i) F 

/p(y fe |x fc )p(x fc |Y fc _i)dx fc ' 1 BocquetefaZ. (2010); Van Leeuwen (2010). 



, , t , \ j i i i u j Filtering step: With an incoming observation yt, the 

where the transition pdf plxk\x.k-i) and the likelihood - — ° 

. / I \ , , i- r particles remain unchanged, i.e., x? . = x£,-, while the 

function p(yfc|xfc) are assumed known, in light of the r ° K > % K > 1 

, , , „ , ,. .. r. , . , . , • associated weights - in light of the choice of the importance 

knowledge of the distributions of the model and observation ° r 

■ r- /i\ ^ i- - t c j » , density p (xt I x? , , ) - are updated according to Bayes' 

errors in Eq. (1). Once the explicit forms of the conditional J \ 1 '-- i >V r J 

pdfs in Eqs. (2) and (3) are obtained, the optimal estimate m ^ e S ° ^ at 

and other associated statistical information can be derived ~ i \ b \ 

based on a certain optimality criterion, e.g., minimum > l N ' 

£ P(yfcl x fc, 4 ) 

variance or maximum likelihood. Thus RBE provides a i=1 



where p(yfc|x| 4 ) is the probability that y^ happens to be 



solution of the estimation problem, and conceptually leads 

wnere p(y k \^ k 

to the optimal nonlinear filter. 

the observation with respect to x^ i . 

In practice, however, difficulties often arise in deriving 

. .. , GU , , , . e . . Re-sampling step: To overcome the problem of weight 
the exact optimal filter, largely due to the fact that the ±- — s. ±- F & 

i ■ c ™ j £ » t ui m. £ collapse, it is customary to introduce a re-sampling step 

integrals in Eqs. (2) and (3) are often intractable. Therefore r J r & r 

.... . • ... , to the PF when the effective sample size is below a 

one may have to adopt a certain approximation scheme F 

e , .. j .i nc \ k *■ r~* i • certain threshold, or alternatively, when the difference 

for evaluation. In the PF, Monte Carlo approximation is 

.... ■ . • , . • Ae c between the weight "entropy" and that with the uniform 

adopted to approximate the prior and posterior pdfs. For & ^ J 

■ . . ■ / i-vr i ... /, , x., . weight exceeds a certain threshold (see Appendix A). Many 

instance, the posteriory (xfc_i|Yfc_i) at the (fe — ljth step & v FF ' J 

. j , implementations of the PF distinguish each other mainly in 

is approximated by F fe J 

their re-sampling strategies. There is a rich literature in this 
respect. Readers are referred to, for example, Amlampalam 
et al. (2002); Van Leeuwen (2009) and the references 
therein on this issue. Here we only consider a re-sampling 



N 

p(x fc _i|Y fc _i) »^»)n,j(xn - x£_ M ) , 



where x|_ x i (i = 1,2, ■■■ ,N) are the particles at the 

filtering Step before a re-sampling algorithm (if necessary) f If there is in fact no need to conduct re-sampling, then x£_ M = xg_ 1 _. 

and u>k—i i = Wfe— l j. If re-sampling is conducted, then Wk—n = l/N, 

is applied, Wk-i,i are the associated weights, and N is the and i may be different from x^_ 1 1 . 
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strategy based on the kernel density estimation (KDE), 
which leads to the so-called regularized particle filter (RPF) 
(Doucet et al. 2001, ch. 12). In the RPF, one applies 
KDE to estimate a continuous pdf of the model state 
based on the particles x£ . and their associated weights 
Wk,i, and then uses this pdf to draw N new particles x£ i 
(i = 1,2, •• • ,N), which are then assigned the identical 
weight 1/N after re-sampling. More details of the RPF 
implemented in this work are provided in Appendix A. 

3. Residual nudging and its implementation in the 
particle filter 

3.1. Residual nudging 

For ease of discussion, we first define some notations. The 
weighted sample mean x£ of the (analysis) particles x^ i is 
given by 

JV 

= Wk ^h , (5) 
i=i 

and the corresponding residual is = 'Hfe(x^) — yJJ with 
respect to the observation y£ = Hfc(x|T) + i% at instant k, 
where x^T is the corresponding truth, and e£ a realization 
of the observation error. Define the weighted ^2-norm of a 
vector z as 

||z||w = ^(W)-iz, (6) 

where the normalization (or weight) matrix W is symmetric 
and positive definite. Throughout this work, W is chosen to 
be the co variance matrix R^, although there certainly exist 
other possibilities (also see the discussion in Section 3.2). 

Under the above setting, and by the triangle inequality, 
one has 

HrtK < \\H k (n) H k (x%)\\n k + \\e° k \\n k • (7) 

For a reasonably good estimate xJJ, we expect that in 
general H'Hfc(x^) — Hk(x t k)\\~R k should not substantially 
exceed the observation noise term ||e^.||R fc , which, 
in a certain sense, is connected to the number of 
independent observations (see the discussion below). 
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On the other hand, we have (E||e°|| R J 2 <JE||e£||^ fc = 
trace(E(£ ; °(e°) T )R A : 1 ) = trace(R fc R i : 1 ) =p, thus the 
expectation E||e£j|R fc of the weighted ^2-norm of the 
observation noise e° k is (at most) in the order of y/p. 
By requiring that a reasonably good estimate have 
||"Hfc(x£) — "Hfc(xJ. r )||R. fc also in the order of y/p (or even 
less), one comes to that ||f^||n, fc should be upper bounded 
by (3y/p, where /3 is a pre-chosen real positive scalar, called 
the noise level coefficient hereafter. The choice of j3 will be 
further discussed later. 

We introduce residual nudging to the PF after the filtering 
step, and before the re-sampling step. The objective in 
residual nudging is the following. We accept x£ as a 
reasonable estimate if its residual norm ||f^||R fc is no larger 
than the pre-specified value fiJp. Otherwise, we consider 
x£ a poor estimate, and thus look for a replacement, say, 
x£, based on the original estimate x£ and the observation 
y£, so that the new residual norm of x£ is no larger than 

In what follows we assume that the observation operator 
Hk is a linear operator (e.g., a matrix), denoted by 
Hfc hereafter. For nonlinear observation operators, the 
procedures in residual nudging become more complicated. 
One may, for instance, linearize Hk locally as in the 
extended Kalman filter, or adopt a numerical optimization 
algorithm to find a replacement estimate. Investigations of 
these possible strategies will be considered in future work. 

In case of linear observations, the objective in residual 
nudging can be achieved as follows. First of all, we 
construct a potentially new estimate x£ by letting 

x£ = c fc x£ + (l-Qfc)x£, (8) 

where c k £ [0, 1] is the fraction coefficient that will be 
calculated later, and x£, termed observation inversion in this 
work, is a solution of the equation 

HfeXfc = y° . (9) 
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Under the assumption that the observation dimension p is 
no larger than the system state dimension n, Eq. (9) may 
be an under-determined problem, i.e., the solutions of Eq. 
(9) may not be unique. Numerically, if Hfc is of moderate 
dimension, one may choose to directly compute a pseudo- 
inverse of Hfc (Luo and Hoteit 2012). On the other hand, 
if the dimension of Hfc is large and it is inconvenient to 
compute a pseudo-inverse in a straightforward way, then 
there are a few alternative ways to find an observation 
inversion x£. One is to conduct a QR decomposition on 
HjT (Luo and Hoteit 2012); another is to directly apply an 
iterative optimization algorithm (e.g., conjugate gradient) to 
the linear equation Eq. (9) (Engl et al. 2000; Nocedal and 
Wright 2006); and the third is to construct a merit function 
(Nocedal and Wright 2006), which recasts the problem of 
solving a linear equation as a least squares problem, as 
described below. 

To construct the merit function, we follow the custom 
in inverse problems (see, for example, Engl et al. 2000) 
and give preference to the solutions with relatively small 
magnitudes. Therefore we recast the problem of solving Eq. 

(9) as a weighted least squares problem, in the form of 

minj|H fc x fe -y£||^ + -||x fc ||^. (10) 
x a 

The second term in ( 1 0) represents a regularization term that 
sorts out a preferred solution from the many possible ones. 
There a is a non-negative scalar and Qk is the weight matrix 
associated with Xfc. 

The specific choice of the regularization term in 

(10) is recommended to use only for the situations in 
which one does not have further "prior knowledge" (e.g., 
physical constraints like variable bounds and/or dynamical 
balance) of the model state. In this specific context, 
the "prior knowledge" does not include the information 
from the available particles themselves, since it is already 
represented by the original estimate x^ in Eq. (8). With 
this argument, it is clear that the formulated least squares 
problem (10) only represents one - but by no means the 
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best - possible choice in finding an observation inversion. If 
one does have certain "prior knowledge" of the model state, 
then it would be more appropriate to re-formulate the least 
squares problem to better reflect the availability of these 
extra information sources, e.g., in the form of a constrained 
optimization problem (Nocedal and Wright 2006), or by 
constructing a special weight matrix Slj, that enhances the 
model balance of the state estimate. In general the formation 
of such a regularization problem may be case-dependent, 
and is thus not pursued in this study. 

In general, the least squares problem (10) can be solved in 
the framework of 3D-Var (Van Leeuwen 2010). Specifically, 
when the observation operator is linear, an explicit solution 
can be obtained as follows 

x° = (anOH^CH^an^H^ + Rfc)- 1 ^. (11) 

If one treats a as a covariance inflation factor, then Eq. 
(11) corresponds to a Kalman update scheme with the 
background mean and inflated covariance being and 
afifc, respectively (and the zero background mean is 
consistent with our solution preference in solving the under- 
determined linear equation). This point of view motivates us 
to take Ofc as the background sample covariance P| of the 
particles (with equal weights). However, in light of the fact 
that the least squares problem (10) requires Oj, to be of full 
rank, we follow the idea in the hybrid EnKF (Hamill and 
Snyder 2000) and choose Sl^ to be a hybrid of P b k and a 
background covariance B which can be obtained from, for 
example, a long-term run of the dynamical model (see the 
descriptions in the experiments later). More concretely, we 
let 

n k = 0.5 P^ + 0.5B, (12) 

throughout this work. On the other hand, as an approximate 
solution to Eq. (9), we are interested in obtaining an 
observation inversion x£ with a relatively small residual 
norm ||r£|| Rfc = ||H fc x£ - y° k \\n k . To this end, we choose 
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a relatively (very) large value for a. Specifically, we let subsequent procedures. In this regard, residual nudging may 

be considered as a technique that, when necessary, provides 
a = 10 10 x trace(R /c )/trace(H fc r2 fc Hj) (13) a correction to the mean of the empirical pdf. One might 

also come up with other higher-order moments correction 

in this work. schemes. 



N 

In the equation x£ = w k ,i ^ the weights ivk.i may 
depend on x£ i? for instance, by letting ijbk.i oc p(y£|x£ i ). 



Next we need to choose a proper fraction coefficient 

Cfc so that the residual norm with respect to the new 

estimate xj is no larger than B^/p. We consider two „, .. , 

K Therefore, in general one needs to solve n (scalar, under- 

possibilities: (a) the original residual norm II f? II r, < B./p. , . . , , T 

" K " * v determined) nonlinear equations with TV x n unknowns 

In this case, we do not introduce any change to the 



original estimate x^, and choose c k = 1 in Eq. (8); and 



whenever residual nudging is conducted, which may appear 

complicated and expensive in high dimensional systems. 

(b) the original residual norm ||?2||r, > B\/P- In this IT , ... . „. , 

" ( " * v Here we adopt a heuristic, yet simple strategy. We let the 

case, by applying Hfe to both sides of Eq. (8), we have 



the new residual rf, — H^x^ — y% = c k if. + (1 — Cfc) r£. 
By applying the triangle inequality again to the new 



weights associated with the new particles be Wk,i = Wk,i- 
In addition, we preserve the deviations to the analysis 
mean so that x^ i — x^ = x£ . — x^. Under this choice, we 



residual norm, it can be shown that a sufficient condition N 

have = X fe + ( X L ~ **). E v>k,i = x*> and the 

to guarantee ||r£||R fc < B\/P is to choose < (By/p — 

||^||Rj/(||^||R fe - H^HrJ. Throughout this work we weighted sample covariance X> M (x^ -x^)(x^ - 
choose c fc = (8^-\\r° k \\ Rk )/(\\r a k \\ Rk - ||r°|| Rfc ). One x^) T of x^ is equal to that of x^. By Eq. (8) we have 
may also take smaller values for Cfc, whose effect is then 

x a = 

equivalent to taking smaller p values. Combining the above k > 1 
two possibilities, one may re-write the choice of Ck in a 



+ (*g-x2)=x£ ii + (l-c fc )(x£-x2), 

(15) 



more compact form i e which imphes that the new set of particles is simply a spatial 

translation of the original one. 

. a 0yp-\K\\ Rk \ .... 

Ck = mm { ||rg||R fc -KI|R fc ) ' ( } Let y ^ = H ^„-' Ki = H.x^ and y% = K k i% be 

the projections of x^ f , x^ i and x^ onto the observation 

After obtaining the new analysis mean xJJ through the space, respectively, then y£ ■ = yJJ i + (1 - c fe )(y£ - y£). 

above procedures, in general one may need to find a new Therefore, when c k < 1 such that 1 - c k > (i.e., when the 

set of particles x£ ■ and the associated weights u> M , so residual norm ||r£|| Rfc exceeds the threshold B^/p), residual 

N 

that x^ = Wk.i %-ki- I n doing so, it is equivalent to nudging tends to move the particle projections toward the 

i=l 

making certain modifications to the empirical posterior observation yf. at the same assimilation cycle, with identical 

pdf in Eq. (3) of the PF, so that the modified empirical length and direction of movement in the observation space, 

posterior pdf may not be equivalent to the original one Through some experiments later, we show that the nudging 

any more. As we have discussed in Section 1, in certain strategy Eq. (15) improves the performance of the PF, in 

circumstances (e.g., when with a relatively small sample terms of filter accuracy and/or stability against divergence, 

size) the original empirical posterior pdf of the PF may be We note that it is also possible for one to introduce 

a poor approximation to the truth. In such circumstances it nudging terms to the particles through other strategies. For 

would appear reasonable to introduce a certain correction to instance, Van Leeuwen (2010); Ades and van Leeuwen 

the original empirical posterior pdf, instead of using it for (2012) introduce nudging terms to the particles and use 
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them as the samples of the chosen importance density. 
Comparison and combination of different possible nudging 
strategies will be considered elsewhere. 

After residual nudging is done, the re-sampling step (if 
any) is then conducted with respect to the new particles x£ { 
and the associated weights Wk.i- The subsequent prediction 
and filtering steps for the next data assimilation cycle are 
the same as in the normal particle filter, followed by another 
residual nudging step if necessary, and so on. 

3.2. Discussion 

It is worth to discuss what the differences may be if one 
replaces the covariance matrix R& by a general symmetric 
and positive definite matrix in (7). In that case, one 
also has an inequality reading (E||£fe||w fc ) 2 < ^l|£fellw fc = 
trace(E(£^(e^) T )W i : 1 ) = trace(R fc W A : 1 ). As a result, 
Eq. (14) becomes 

/ /3^trae e (R fe W^VKIIw fc \ 

c fc = mm 1, — — — — , 

I l|rg||w» - Kllw* I 

while the subsequent equations, e.g., Eqs. (8) and (15), 
remain unchanged. Therefore, for a given j3, the choice 
of Wfc only affects the value of the fraction coefficient 
Cfc, which in effect is equivalent to varying the noise level 
coefficient /3 given a fixed normalization matrix, say, R^. 
Therefore, in this work we do not investigate the effects of 
different normalization matrices Wfe. Instead, we examine 
the impact of j3 on residual nudging in some experiments 
later. 

Once a noise level coefficient j3 is chosen, we keep 
it constant over the whole assimilation time window. 
However, the corresponding fraction coefficient in 
Eq. (14) may vary from time to time, so that the new 
analysis x£ in Eq. (8) is an adaptive combination of the 
original analysis x£ and the observation inversion x£. 
Roughly speaking, the choice of /3 reflects the relative 
confidence of the filter designer in x^ and x£. A small 
j3 means that the filter designer tends to rely heavily on 
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x£, while a large /3 implies that the filter designer wants 
x£ to be dominant. These can be seen from Eqs. (14) and 
(8). Because Ck £ [0, 1], the new analysis x£ in Eq. (8) 
is a convex combination of x£ and x£, i.e., an estimate 
somewhere in-between the original estimate x£ and the 
observation inversion x|, depending on the value of Cfc. 
If one chooses a large value for /?, or, if for a fixed j3 
the original residual norm f£ is sufficiently small, then 
the fraction coefficient Cfc — > 1 according to Eq. (14), thus 
x£ — r x£ according to Eq. (8). Therefore x£ will be a good 
estimate if x^ is so (as will be further discussed later), but 
may not be able to achieve a good estimation accuracy when 
x£ itself is poor. On the other hand, if one chooses a very 
small value for j3, or, if for a fixed j3 the original residual 
norm f £ — > +oo (e.g., with filter divergence), then —r 0, 
x£ —7 x£, and ||f^||R fc — > 0. In this case, the estimate x£ is 
calculated mainly based on the information content of the 
observation y£, and may result in a relatively poor accuracy 
due to the existence of the observation noise £? in Eq. (lb), 
together with the ignorance of the prior information about 
the dynamical model. However, using x? as the estimate is 
often a relatively safe (although conservative) strategy, in 
that for a given observation y?, x£ tends to be less sensitive 
to the model error and the sample size (hence the effect of 
weight collapse). 

Our main objective in this study is to present residual 
nudging as a safeguard strategy for the PF in case that it 
does not perform well in certain circumstances, due to, for 
instance, the small sample size. However, it may still be of 
interest to gain some insights on the asymptotic behaviour 
of the PF with residual nudging. For instance, what happens 
if one introduces residual nudging to a PF which, with 
infinitely many particles, converges to the optimal solution. 
In such cases, the PF with residual nudging can have the 
same optimal solution as the normal PF, provided that f3 
is sufficiently large. This can be achieved by making the 
fraction coefficients Ck = 1 for all k, such that by Eqs. (8) 
and (15) residual nudging will have no effect on the original 
PF solution. To guarantee Ck = 1 for all k, a sufficient 
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condition is to make /?\/p/||rjfc||R. )! in Eq. (14) no less than 
1, which implies that f3 > max ||r£||R fc /-y/p. In this aspect, 
a more convenient strategy would be to make j3 adaptive 
with time, rather than pre-set it over the whole assimilation 
window. Given our main objective in this study, though, we 
do not consider the adaptive choice of /3. 

4. Numerical results in a linear scalar model 

First we investigate the performance of the RPF and 
RPF-RN in a scalar, first order autoregressive (AR1) 
model driven by Gaussian white noise. The motivation in 
conducting this experiment is the following. Due to the low 
dimensionality of the model, the estimate of the normal PF 
would approach the optimal one with a reasonably small 
sample size (in terms of computational cost). This provides 
a computationally convenient platform to investigate the 
behaviour of the PF with residual nudging when the normal 
PF is performing well. 

In the experiment the scalar AR1 model is given by 

x k +i = 0.9x k +E%, (16) 

where e™ represents the dynamical noise, which follows the 
Gaussian distribution with zero mean and variance 1, and is 
thus denoted by £™ ~ N(e k n : 0, 1). The observation model 
is described by 

yk=x k +s° k , (17) 

where e° k ~ N(e^ : 0, 1) is the observation noise, and is 
uncorrelated with e™. 

The filters adopted in this work are based on the 
regularized particle filter (RPF) (Doucet et al. 2001, ch. 
12, also see Appendix A). We compare the performance 
of the normal RPF and that of the RPF equipped with 
residual nudging (RPF-RN). In the experiment, we integrate 
the AR1 model forward for 10, 000 steps (integration steps 
hereafter), with the initial value xo randomly drawn from 
the Gaussian distribution N(Q, 1), and the associated initial 
prior variance being 1. The true states (truth) {a^}^ 9 
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are obtained by drawing samples of dynamical noise from 
the distribution N(0, 1), and adding them to x k to obtain 
Xk+i at the next integration step, and so on. The synthetic 
observations y% are obtained by adding to the model state Xk 
samples of observation noise from the distribution N(0, 1), 
and are assimilated into the AR1 model every 4 integration 
steps. The RPF has an ensemble of 1000 particles, which 
are initialized by drawing samples from 7V(0, 1). Except for 
the presence of residual nudging and the related procedures, 
the RPF-RN has the same experiment settings as the normal 
RPF. In the RPF-RN, the background covariance B in Eq. 
(12) is obtained by integrating the AR1 model forward for 
100, 000 steps and taking B as the temporal covariance of 
the corresponding trajectory. The noise level coefficient f3 
in the RPF-RN is taken from the set {0.02, 0.2, 1, 2 : 2 : 
20}, where the notation Vi : Sv : Vf means a set of values 
that grows from the initial value Vi to the final one Vf, 
with an even increment Sv each time. To reduce statistical 
fluctuations, we repeat the experiment 20 times, each time 
with randomly drawn xq, e™, e° k and the initial particles of 
the filters. 

We use the average root mean squared error (average 
RMSE) to measure the accuracy of a filter estimate. For 
an n-dimensional system, the RMSE e k of an estimate 
Xjt = [£k,i, •■■ , Xk. n ] T with respect to the true state vector 
x aT = [ x< ki^ ' ' ' ' xt kn\ T at tmie mstan t k is defined as 

e fc = ||x fe -x* fe li„A/^, (18) 

where I n denotes the n-dimensional identity matrix. The 
average RMSE e& at time instant k over M repetitions of 
the same experiment is thus defined as ik = YljLi e k l^ 
(M = 20 in our setting), where el denotes the RMSE 
at time instant k in the jth repetition of the experiment. 
We also define the time mean RMSE e as the average of 
ik over the assimilation time window with S integration 
steps, i.e., e = ^2^=$ &k/S (S = 10000 here). One may 
also adopt other metrics (e.g., a certain weighted norm as 
in Section 3), rather than the Euclidean norm in Eq. (18), 
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as the performance measure. Since it is customary to use 
the Euclidean norm in the literature, we stick to this choice 
hereafter. 

Fig. 1 shows the time mean RMSEs of the RPF and RPF- 
RN as functions of the noise level coefficient (3. Because 
of the different orders of magnitudes of (3 used in the 
experiment, the horizontal axis is plotted in the logarithmic 
scale. The time mean RMSE of the RPF is around 1.08*, 
independent of /?, therefore the corresponding curve is a 
horizontal line. For the RPF-RN, its performance depends 
on (3. Starting from (3 = 0.02, the time mean RMSE of the 
RPF-RN tends to decrease as f3 grows, until (3 reaches 10. 
After that, there are some slight fluctuations as (3 grows 
further. The behaviour of the RPF-RN is largely consistent 
with our discussion in Section 3.2. Indeed, when (3 is 
relatively small, the fraction coefficient Ck in Eq. (14) tends 
to be smaller, thus by Eq. (8) the observation inversion 
has a larger impact on the estimate of the RPF-RN, while 
the reasonably good original state estimate may be under- 
represented. As a result, the performance of the RPF-RN 
is relatively poor in comparison to the normal RPF. As /3 
increases, Ck approaches 1, hence the original state estimate 
becomes more influential, so that the performance of the 
RPF and RPF-RN becomes close to each other. 

Fig. 2 depicts the time series of the fraction coefficients of 
the RPF-RN. At f3 = 0.2 (upper panel), there is a significant 
number of Ck values that are relatively low, with 460 out of 
2500 Ck values being less than 0.5, and the mean value c 
of Ck being 0.8206. In contrast, at (3 = 2 (lower panel), the 
fraction coefficient tends to be larger. Only 9 out of 2500 
Ck values are less than 1, while the mean value c is 0.9997. 
In both cases, though, the mean values c are quite close to 
1, meaning that 1 — c are relatively small. Thus by Eqs. (8) 
and (15), the particles in the RPF and those in the RPF-RN 
may not be significantly different. This might explain why 
even with a small value of (3, say at (3 = 0.2, the time mean 

^For reference, the corresponding time mean RMSE of the Kalman filter is 
about f .06 (Luo and Hoteit 2012). 
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RMSE of the RPF-RN remains quite close to that of the 
RPF. 

The above results suggest that it may not be very 
meaningful to introduce residual nudging to the PF when 
it already performs reasonably well. However, in many data 
assimilation practices, the dimensionality of the problems 
is often very high. Thus it may be prohibitively expensive 
to run a PF with a sufficiently large sample size in 
order for the filter to achieve good perform. On the other 
hand, the PF may perform poorly when running only 
with a finite, relatively small, sample size. Through the 
experiments below, we show that in cases that the PF does 
not perform well, equipping the PF with residual nudging 
may (substantially) enhance the filter performance, in terms 
of filter accuracy and/or stability against divergence. 

5. Numerical results in the 40-dimensional L95 model 

5.1. Experiment settings 

We use the 40-dimensional L95 model (Lorenz and 
Emanuel 1998) as the testbed. The governing equations of 
the L95 model are given by 

dx ■ 

= (x i+1 - x l ^ 2 )x l -i -Xi +F, i = 1, •■■ ,40. 

at 

(19) 

The quadratic terms simulate advection, the linear term 
represents internal dissipation, and F acts as the external 
forcing term (Lorenz 1996). Throughout this work, we 
choose F — 8 unless otherwise stated. For consistency, we 
define ar_i = £39, .To = X40, and X41 = x\ in Eq. (19), and 
construct the state vector x = [xi,X2, ■ ■ ■ , X4o] T . 

We use the fourth-order Runge-Kutta method to integrate 
(and discretize) the system from time to 75, with a 
constant integration step of 0.05. To avoid the transition 
effect, we discard the trajectory between and 25, and 
use the rest (with overall 1000 integration steps) for data 
assimilation. The synthetic observation y> is obtained by 
measuring (with observation noise) every d elements of the 
state vector x^ = [xk,i,Xk,2, ■ • • , Xk,4o] T at time instant k 
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(k = !,-■■ ,1000), i.e., 

y k = H d x fc + e° k , (20) 

where H d is a ( J + 1) x 40 matrix such that H d x k = 
[x k .i,x k .i +d , ■ ■ ■ ,x k ,i+jd] T , with J = floor(39/rf) being 
the largest integer that is less than, or equal to, 39/d, and e£ 
is the observation noise following the Gaussian distribution 
N(e^ : 0, Ij+i), with Ij+i being the (J + 1) -dimensional 
identity matrix. The elements (H d ) pq of the matrix H d can 
be determined as follows. 

(U d ) pq = lifg=(p-l)d+l, otherwise {U d ) pq = , 

for p = 1, • • • , ( J + 1), g = 1, • • • , 40. In all the experi- 
ments below the observations are made for every 4 integra- 
tion steps unless otherwise stated. 

The filters in the experiments are configured as follows. 
To generate the initial particles, we run the L95 model from 
to 2500 (overall 50000 integration steps), and compute 
the temporal mean and covariance of the trajectory (the 
obtained temporal covariance is also used as the background 
covariance B in Eq. (12)). We then assume that the initial 
state vectors follow a Gaussian distribution with the same 
mean and covariance, and draw a specified number of 
samples as the initial particles. In many of the experiments, 
the sample sizes are relatively small so that the phenomenon 
of weight collapse is very severe. To mitigate this problem, 
we introduce a "jittering" procedure to the re-sampling step 
of the RPF following Gordon et al. (1993) (one may achieve 
a similar effect by increasing the bandwidth of the RPF). 
Concretely, after the re-sampling step of the RPF is finished, 
a random perturbation drawn from the Gaussian distribution 
7V(O,0.01 x I40) is added to each generated particle. Our 
experience shows that introducing "jittering" to the normal 
RPF improves the performance of the filter, especially in the 
case of small sample sizes. We note that the performance 
improvement of the RPF-RN over the normal RPF, as will 
be shown soon, does not depend on whether "jittering" 
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is introduced or not. Performance improvement similar to 
what will be presented below was also observed when no 
"jittering" was introduced (results not reported). 

To reduce statistical fluctuations, we repeat each 
experiment below for 20 times, each time with randomly 
drawn initial state vectors of the L95 model, initial particles 
and observations. Except for the introduction of residual 
nudging, in all experiments the RPF and RPF-RN have 
identical configurations and experiment settings. 

5.2. Experiment results 

5.2.1. Results with different observation operators 

Here we consider four different observation operators H d , 
with d = 1,2, 4, 8, respectively. For convenience, we refer 
to them as the full, 1/2, 1/4 and 1/8 observation scenarios, 
respectively. The concrete configurations of the RPF and the 
RPF-RN are the following. In both filters the sample sizes 
are fixed to be 20. In the RPF-RN, we let the noise level 
coefficient /3 <G {0.02, 0.2, 1,2:2: 20}. 

The time mean RMSEs (over 20 repetitions) of the 
normal RPF are 4.8389, 4.8963, 4.8966 and 4.9303 in the 
full, 1/2, 1/4 and 1/8 observation scenarios, respectively. 
This shows that as the number of assimilated observations 
decreases, the time mean RMSE of the RPF becomes larger. 

The time mean RMSE of the RPF-RN as a function of 
the noise level coefficient f3 is shown in Fig. 3 (dash-dotted 
lines marked with diamonds), in which, for references, the 
corresponding time mean RMSEs of the normal RPF are 
also plotted as solid horizontal lines (since they do not 
depend on (3). In the full observation case (upper left panel), 
when /3 is small, say /3 = 0.02, the time mean RMSE is 
close to 1. This is expected, since in this case, (3 — > 
implies that c k — > according to Eq. (14), and x£ — > y£ in 
the full observation scenario according to Eq. (8), whose 
time mean RMSE should thus be equal to 1, due to the 
fact that the observation error covariance is I40, and that 
the L95 model has no dynamical noise (or very little due 
to "jittering"). As j3 increases, the time mean RMSE of the 
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RPF-RN tends to decrease until f3 reaches 6, which achieves 
the minimum time mean RMSE 0.7789, substantially lower 
than the corresponding value 4.8389 in the RPF. Beyond 
that, continuing increasing j3 would deteriorate the filter 
performance instead. Overall, the estimate x£ of the RPF 
appears less informative than the observation inversion x£ 
in the sense that x£ yields larger time mean RMSE than x? 
(the estimate of the RPF-RN at a /3 ->• 0). 

The time mean RMSEs of the RPF-RN in the 1/2,1/4 and 
1/8 observation scenarios exhibit behaviours similar to that 
in the full observation scenario. They all tend to decrease as 
(3 grows from 0.02. However, for the 1/2 and 1/4 observation 
scenarios, they achieve their minimum time mean RMSEs 
around (3 = 20, while for the 1/8 observation scenario it 
is around f3 = 16. Clearly, in all these three scenarios, the 
time mean RMSEs with f3 —> are still (much) lower than 
those of the normal RPF, showing again that, in this specific 
context, the estimate of the RPF (with 20 particles) is 
less informative than the observation inversion. In addition, 
there are even larger gaps between the minimum time mean 
RMSEs of the RPF-RN and the corresponding RMSEs of 
the normal RPF. This shows that a proper choice of j3 can 
lead to further performance improvement (in terms of filter 
accuracy) of the RPF-RN, in contrast to just choosing the 
observation inversion as the estimate. 

The upper panel of Fig. 4 shows a sample time series 
of the RMSEs of the RPF and RPF-RN (/? = 2) in the 
1/2 observation scenario, and the lower panel indicates the 
time series of the corresponding fraction coefficient of the 
RPF-RN. The RMSEs of the RPF-RN are lower than those 
of the RPF for a large proportion of the assimilation time 
window, and the corresponding fraction coefficients of the 
RPF-RN tend to be relatively small. Only 13 out of the 
250 coefficients are larger than 0.1, and the mean value of 
these 250 coefficients is 0.0552, indicating that in Eq. (8), 
the relative weights of the observation inversions dominate 
those of the original estimates of the RPF. 

We also use the rank histogram of the true model 
state (truth hereafter) as a diagnostic tool to examine the 
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spread of the particles. Concretely, let Xk be a scalar 
that may be considered as an estimate of the true value 
at time instant k, and {ifc.j }^=i an ensemble of TV 
such estimates. Then the rank of the truth with 
respect to the set {xk,j}j—i is obtained by sorting the 
magnitudes of x^ and £kj (j = 1, • • ■ , TV) in ascending 
order. Collecting this information at every time step, one 
obtains a set of ranks {rk}^.Zo during the assimilation time 
window [0,5 — 1]. A rank histogram is thus a histogram 
that shows the distribution of (k = 0, • • • ,5 — 1) during 
the assimilation time window. Readers are referred to, 
for example, Hamill (2001), for more information of this 
graphical plot. In the context of particle filtering, roughly 
speaking, for a set of particles with reasonable variability, 
the corresponding rank histogram will be relatively flat, 
indicating that the truth is statistically indistinguishable 
from the particles. A U-shaped rank histogram normally 
indicates a spread deficiency in the particles, while a bell- 
shaped rank histogram indicates the opposite, i.e., over- 
estimated spread. 

In Fig. 5, we show the rank histograms of the first 
four elements, x t £ i (i = 1,2,3,4), of the truths xt r (fc 
1 . ■ ■ ■ , 1000) over the whole assimilation time window in 
the full observation scenario. The left column shows the 
rank histograms of the RPF with the sample size being 20, 
and the right column those of the corresponding RPF-RN (at 
/3 = 6). For all of the four elements, their rank histograms 
in the RPF are deeply U-shaped, with the truths mostly 
concentrating on the edges of the histograms, meaning that 
the particles in the RPF substantially under-represent the 
variability. In contrast, the rank histograms in the RPF- 
RN exhibit improvements in terms of flatness (although 
still deeply U-shaped), meaning that better variability 
representations are achieved in the RPF-RN, as a by-product 
of residual nudging. Similar rank-histogram improvements 
are also observed in other observation scenarios with 
different sample sizes, though in some cases they may not 
be as significant as those shown in Fig. 5. 
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5.2.2. Results with different sample sizes 

Here we examine the time mean RMSEs of the RPF 
and the RPF-RN as functions of the sample sizes. 
The experiment settings are the following. We conduct 
the experiments in the 1/2 observation scenario, in 
which the observation operator is H d , with d = 2. In 
both filters the sample sizes N are chosen from the 
set {1, 10, 20, 40, 60, 80, 100, 200, 400, 600, 800, 1000}. In 
the RPF-RN, we let the noise level coefficient (3 <E 
{1,5,10,15}. 

For reference, we also investigate the performance of the 
EnKF with perturbed observations (Burgers et al. 1998) 
under the same experiment settings. Covariance inflation 
(Anderson and Anderson 1999) is introduced to the EnKF 
for all sample sizes, and covariance localization (Hamill 
et al. 2001) is conducted only when the sample size N < 
100% following the results in Fig. 5 of Bocquetef al. (2010). 
For conciseness, in what follows we only present the best 
possible results of the EnKF among the filter configurations 
that we have tested. 

Fig. 6 shows how the time mean RMSEs of, (a) the 
EnKF, (b) the RPF, and (c) the RPF-RN, change with the 
sample size. For the EnKF (Fig. 6(a)), numerical results 
show that it diverges^ when the sample size N < 10. For 
this reason, we only present the results with the sample size 
N > 20. As shown in Fig. 6(a), at N = 20, the time mean 
RMSE of the EnKF is 4.4658. As the sample size increases, 
the corresponding time mean RMSE drops rapidly until N 
reaches 80. After than, the time mean RMSE of the EnKF 
seems to enter a plateau, with the time mean RMSE being 
around 0.73 and insensitive to the increase of N. 

For the RPF (Fig. 6(b)), when with only 1 particle, its 
time mean RMSE is 5.1387. As the sample size increases, 

^Concretely, we follow the procedures in Luo et al. (2010a,b) to conduct 
covariance localization, in which a parameter l c , called length scale, is 
involved in order to control the range of cut-off (Hamill et al. 2001). On 
the other hand, covariance inflation is conducted by inflating a covariance 
matrix by a multiplicative factor (1 + S) 2 . In the experiment we let 5 6 
{0 : 0.01 : 0.06} and l c G {10 : 20 : 150}. 

^Here a divergence is referred to as an event in which the RMSE of a filter 
at a certain time instant is larger than 10 3 . 
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the time mean RMSE in general tends to decrease, though 
there are also certain statistical fluctuations. With the 
sample size growing to 1000, the time mean RMSE of the 
RPF reduces to 4.3195. 

Fig. 6(c) shows the corresponding time mean RMSEs of 
the RPF-RN with different j3. The following phenomena 
are observed. (1) For a fixed sample size, the time mean 
RMSE decreases as (3 increases from 1 to 15; (2) The RPF- 
RN with different f3 exhibits similar response to the change 
of the sample size. When the sample size is lower than 
100, the time mean RMSEs of the RPF-RN follow a U- 
turn behaviour, achieving the minimum values somewhere 
between sample size 1 and 100. For sample sizes larger than 
100, the time mean RMSEs of the RPF-RN also seem to 
follow a U-turn behaviour, achieving their minimum values 
around sample size 600. A possible explanation of these 
phenomena is that changing the sample size has an effect 
on the residual norm ||r£||R fc . This in effect is equivalent to 
changing the j3 value in Eq. (14) with a fixed residual norm 
||r^||R fe , and may thus cause the U-turn behaviour, as have 
already been observed in Fig. 3. 

Comparing Figs. 6(b) and 6(c), it is clear that, with 
the above specific experiment settings, the RPF-RN with 
j3 E {1,5,10,15} systematically outperforms the RPF in 
terms of time mean RMSE. Even with the sample size 
of 1000, the estimate of the RPF is still less informative 
than the observation inversion (cf. the time mean RMSE at 
f3 = 0.02 in the upper right panel of Fig. 3). As a result, 
in Fig. 6(c) one can see that the estimate of the RPF-RN 
with only 1 particle is still (much) better than that of the 
RPF with 1000 particles. This shows that it is possible, 
in certain circumstances, for the RPF-RN with a relatively 
small sample size to achieve better filter performance than 
that of the normal RPF with a substantially larger sample 
size, similar to the result reported in Van Leeuwen (2010). 
This conclusion, however, should only be interpreted in 
conjunction with the above experiment settings. 

Finally, a comparison between the RPF-RN and the 
EnKF shows that, when the sample size is relatively small, 
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say, N < 40, the RPF-RN tends to outperform the EnKF. 
With a larger sample size, though, the EnKF may perform 
much better than the RPF-RN instead. We stress that the 
conclusion that the RPF-RN performs better than the EnKF 
(with perturbed observation) for a relatively small ensemble 
size may depend on the experiment setting. For instance, if 
one replaces the EnKF by the ensemble adjustment Kalman 
filter (EAKF) (Anderson 2001), then with the sample size 
20 the EAKF may outperform the RPF-RN instead (see, 
for example, the numerical results in Luo and Hoteit 2012). 
A related question is then when it is recommended to use 
the particle filter with residual nudging (PF-RN), instead 
of the EnKF. In our opinion, advantages in using the PF- 
RN may include that its performance appears more robust 
with relatively small sample sizes (in this aspect one may 
wish to compare the numerical results in different scenarios 
that are presented in this study and those in Luo and 
Hoteit 2012), and that there is no need to tune the intrinsic 
filter parameters in the EnKF, i.e., the covariance inflation 
factor and the length scale of covariance localization. As 
shown in Luo and Hoteit (2012), in certain circumstances 
the EnKF may diverge for some combinations of the 
covariance inflation factor and the length scale of covariance 
localization. Therefore, in practice if one is only able to 
afford a small sample size, it might be worth to run a PF- 
RN first, and then, if possible (and desirable), use the PF- 
RN estimate as the baseline to see if it would be better to 
use the EnKF instead. On the other hand, we envision that 
there is still space of improvement for the PF-RN in the 
future. One possibility is to equip the PF-RN with a better 
importance density to further mitigate the effect of weight 
degeneracy, which may be done by, for instance, combining 
the equal-weight particle filter (Van Leeuwen 2010; Ades 
and van Leeuwen 2012) with residual nudging. 

5.2.3. Results with different assimilation frequencies and 
observation noise covariances 

Here we examine the effects of the assimilation frequency 
and the observation noise covariance matrix on the 
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performance of the RPF and RPF-RN. To this end, 
we vary the assimilation frequency, and choose to 
assimilate the observations for every S a step(s), with 
S a S {1, 2, 4, 6, 8, 10, 12}. For convenience, we call S a the 
assimilation step when it causes no confusion. To examine 
the effect of the observation noise covariance matrix, we 
assume that the covariance matrix Rfc is of the form 7I, 
where I is the identity matrix with a suitable dimension, 
and 7 a positive scalar. As a result, the variances are 7 
for all measurements in an observation vector, while the 
cross-variances are all zero. In the experiment we choose 
the variance 7 from the set {0.01, 0.1,1, 10}. The relatively 
large value of 7 at 10 is used to represent the scenario 
in which the quality of the observations is relatively poor. 
Here we assume that we know 7 precisely, while for the 
experiment in the next sub-section (Section 5.2.4), we 
will consider the case in which 7 is mis-specified. In the 
experiment we consider both the 1/2 and 1/40 observation 
scenarios. In the latter case only the first element of the 
model state is observed (equivalent to setting d = 40 in 
Eq. (20)), a scenario in which the filters may be subject 
to divergences. Other experiment settings are as following. 
The sample size N is 20 for both the RPF and the RPF-RN 
(unless otherwise mentioned). In the RPF-RN we set f3 = 
0.02, which is a relatively small value chosen to enhance 
the stability of the RPF-RN (see the discussion in Section 
3.2). 

Fig. 7 reports the performance of the RPF with different 
S a and 7 in the 1/2 observation scenario (solid lines with 
asterisks). When 7 is relatively small, say 7 = 0.01,0.1 
and 1 (upper left, upper right, and lower left panels, 
respectively), the time mean RMSE of the RPF is the 
smallest at S a = 1, and tends to increase as S a grows. 
For a sufficiently large S a (say at S a = 6), though, further 
increasing S a does not significantly change the performance 
of the filter. Interestingly, at 7 = 10 (lower right panel), the 
RPF with a larger S a tends to have better performance than 
the RPF with a smaller S a ■ In addition, when S a is relatively 
large, say S a = 12, the RPF at 7 = 10 performs better than 
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the RPF at other smaller 7 values. For comparison, we 
also show the corresponding performance of the RPF-RN 
(f3 = 0.02) in the same figure (dash lines with diamonds). 
For a fixed 7, the time mean RMSE of the RPF-RN is 
slightly U-shaped as S a changes, and it tends to achieve its 
minimum with S a > 1. On the other hand, for a fixed S a , 
the time mean RMSE of the RPF-RN tends to increase as 7 
increases. In all the tested cases, the time mean RMSEs of 
the RPF-RN are lower than the corresponding values of the 
normal RPF. 

It might not be consistent with our intuition to see 
that a PF with a larger assimilation step S a and worse 
observation quality (in the sense of having a larger 7) has 
better performance. In our opinion, though, this might be 
explained from the point of view of the effects of S a and 
7 on the effective sample size (ESS). Let {x-k,i}iLi be 
the set of particles that are associated with the weights 
{wk.i]f=i after applying the weight update formula Eq. 
(4), but before conducting re-sampling (if any). The ESS, 

N 

defined as l/( J (Liu and Chen 1995), can be used as 

i=l 

an indicator of the degree of weight collapse at time instant 
k in a particle filter. One can also define time mean ESS in 
a way similar to that in defining the time mean RMSE (see 
the text below Eq. (18)). 

In terms of time mean ESS and information contents of 
incoming observations, large S a and 7 have both positive 
and negative effects on the filter performance. A relatively 
large assimilation step S a means that there are more model 
integration steps in between two successive observations. 
For the relatively small sample size N = 20, re-sampling 
is often performed, after which the re-sampled particles 
have uniform weights. These uniform weights are then 
carried to the subsequent model integration steps, until 
they are updated with the next incoming observation. As 
a result, a larger assimilation step S a implies that, on 
the one hand, the time mean ESS of the particle filter 
tends to be larger, while on the other, less information 
contents of the observations are assimilated. Similarly, a 
larger 7 tends to make the weights of the particles more 
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uniform, which in turn increases the time mean ESS. 
On the other hand, though, observations with a larger 
7 contain more uncertainties and less information about 
the underlying model state. Therefore in our opinion, the 
reported behaviour of both filters in Fig . 7 may largely result 
from the combined positive and negative effects in choosing 
S a and 7. For verification, in Table I we show the time 
mean ESS of both the RPF and RPF-RN (/3 = 0.02) with 
different combinations of S a and 7 in the 1/2 observation 
scenario. As can be seen there, the time mean ESS of both 
filters indeed tend to increase as S a and/or 7 increase(s). 

Fig. 8 shows the performance of the RPF (solid lines 
with asterisks) and RPF-RN (dash lines with diamonds) in 
the 1/40 observation scenario. Both filters exhibit behaviour 
similar to that in Fig. 7, e.g., the filters may have better 
performance with a larger S a and/or 7 (when there is no 
filter divergence). An examination of the time mean ESS of 
both filters shows that they are close to the time mean ESS 
reported in Table I, and are thus not shown for conciseness. 

Compared with Fig. 7, there are also a few differences in 
Fig. 8. One is that, unlike the situation in the 1/2 observation 
scenario, filter divergences are spotted in both the RPF 
and RPF-RN in certain circumstances. Accordingly, when 
a filter divergence is spotted, there will be no RMSE value 
plotted in the corresponding place in Fig. 8. Following this 
setting, for instance, the upper right panel (with 7 = 0.1) 
of Fig. 8 shows that the normal RPF diverges at S a = 1, 2 
and 4, while the RPF-RN diverges at S a = 1 only. In terms 
of stability against divergence, the results in Fig. 8 show 
that the RPF-RN (/3 = 0.02) tends to be more stable than 
the normal RPF at different 7 values. In addition, when 7 
is relatively small, say, 7 = 0.01, 0.1 and 1, the RPF-RN 
still tends to perform better than the normal RPF in terms 
of filter accuracy. However, at 7 = 10 and with S a = 8, 
10 and 12, the time mean RMSEs of the RPF-RN become 
larger than those of the RPF. This is largely because in such 
cases the RPF achieves reasonable performance, while with 
a relatively small value j3 = 0.02, the RPF-RN tends to rely 
excessively on the observation inversion (cf. Eqs. (14) and 
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(8)). The RPF-RN can also have time mean RMSEs that are 
very close to those of the normal RPF by increasing f3 to, for 
instance, 10. This choice, however, may make the RPF-RN 
less stable at smaller S a values. This serves as an example to 
show that (3 has an impact on the trade-off between a filter's 
potential accuracy and stability. 

An additional remark regarding the relatively superior 
performance of the normal RPF in the lower right panel of 
Fig. 8 is the following. The effective dimension of the L95 
model, in terms of the "Kaplan- York" dimension (Ruelle 
1989), is about 27.1 (Lorenz and Emanuel 1998), while the 
time mean ESS of the normal RPF at S a = 8, 10 and 12 are 
around 18, not quite far away from the fractal dimension. 
In such cases, the subspace spanned by the particles of the 
normal RPF may capture the state space features of the 
L95 model reasonably well. As a result, in this specific 
context, the information contents of the observations may 
not be very influential on the estimation accuracy of the 
normal RPF (but may still be useful in terms of filter 
stability against divergence). If the subspace spanned by the 
particles is a less proper representation of the state space, 
we expect that the information contents of the observations 
may become more important to the filter performance. To 
this end, we conduct one more experiment in the 1/40 
observation scenario, in which we let S a = 12 and 7 = 
10, but reduce the sample size N of the normal RPF to 
N = 5. For comparison, we also examine the performance 
of the RPF-RN with different noise level coefficients /3 £ 
{0.02 : 0.02 : 0.1, 0.2 : 0.2 : 1, 2, 3, 4, 6, 8}. The time mean 
RMSEs of the RPF (solid horizontal line) and RPF-RN 
(dash-dotted line with diamonds), as functions of (3, are 
shown in Fig. 9. Compared with the lower right panel of 
Fig. 8, the performance of both filters deteriorates. However, 
in all the tested cases the RPF-RN performs better than 
the normal RPF. In addition, the RPF-RN tends to have a 
lower time mean RMSE with a smaller (3, meaning that the 
RPF-RN has better performance when it relies more on the 
observations in residual nudging. 
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5.2.4. Results with inaccurately specified models and 
observation systems 

Finally we examine the performance of the RPF and 
the RPF-RN in the presence of errors in specifying 
the dynamical model and the observation system. For 
convenience of discussion, we confine ourselves to the 
1/2 and 1/40 observation scenarios. In the 1/2 observation 
scenario, we assume that in the experiments the forcing 
term F in Eq. (19) and the observation error co variance 
Rfe are possibly mis-specified. The true value of F is 8, 
and the true observation error covariance R& is I20. In 
the experiments we let the value of F be chosen from the 
set {4, 6, 8, 10, 12}, and be in the form of 7I20, with 
the observation noise variance 7 <G {0.25, 0.5, 1, 2, 5, 10}. 
Note that in the RPF-RN, R^ is not only used to update 
the weights of the particles as in Eq. (4), but also used 
to compute the fraction coefficient in residual nudging 
(cf. Eq. (14)). We let the sample size N — 20 and the 
assimilation step S a = 4 in both filters, and the noise level 
coefficient ft = 1 in the RPF-RN. 

Fig. 10 shows the contour plot of the time mean RMSE 
of the RPF, with respect to the values of the forcing term F 
and the observation noise variance 7, in the 1/2 observation 
scenario. For a fixed 7, the time mean RMSE of the RPF 
tends to increase as F grows. On the other hand, for a 
fixed F, the time mean RMSE tends to decrease as 7 grows 
(which may also be explained based on the arguments in 
Section 5.2.3), with the decrement rates becoming smaller 
at larger F values. 

For comparison, Fig. 11 depicts the corresponding 
contour plot of the time mean RMSE of the RPF-RN in 
the 1/2 observation scenario. There appears to be a "sink" 
around the point (F =10,7= 10). Along a fixed direction, 
the further away from the sink, the larger the time mean 
RMSE tends to be. Comparing Figs. 10 and 1 1, one can see 
that the RPF-RN again outperforms the RPF in all tested 
cases. In fact, even the largest time mean RMSE of the RPF- 
RN (around the lower left corner of Fig. 11) is still lower 
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than the best time mean RMSE of the RPF (around the lower 
right corner of Fig. 10). 

The experiment settings of the 1/40 observation scenarios 
are almost the same as those in the 1/2 observation scenario, 
except that the assimilation step S a of both filters becomes 
12. With fewer measurements in an observation vector, 
filter divergences are also spotted in some cases. Therefore, 
instead of presenting the contour plots, we choose to 
directly report the time mean RMSEs of both filters in 
Table II, in which filter divergences are marked by "Div" 
in relevant places. The results there show that, for a fixed 
F, the time mean RMSEs of both filters tend to decrease 
as 7 grows. On the other hand, for a fixed 7, the time 
mean RMSE of the RPF tends to grow with F when 7 is 
relatively small (say, at 7 = 0.25), and exhibits slightly U- 
shaped behaviour when 7 is relatively large (say, at 7 = 10). 
The time mean RMSE of the RPF-RN also exhibits similar 
behaviour. Overall, the RPF-RN (j3 = 1) tends to perform 
better than the RPF, although compared to the results in 
Figs. 10 and 11, the gap between the RPF and RPF-RN 
(/3 = 1) is clearly narrowed. Both filters diverge in all cases 
with F = 12, and two other cases at (F = 10, 7 = 0.5) and 
(F = 10, 7 = 1). However, numerical experiments (results 
not reported) show that in this case one can improve the 
stability of the RPF-RN by reducing j3 to some smaller 
value, say, 0.02. 

6. Conclusion 

In this work we considered an observation-space based 
auxiliary technique, called residual nudging, to enhance 
the performance of the particle filter. The main idea of 
residual nudging is to monitor, and if necessary, adjust 
the residual norm of a state estimate so that it does not 
exceed a pre-specified threshold. We suggested a rule to 
choose the threshold, and proposed a method to do the 
possible adjustment in case of linear observations. For 
demonstration, we used the regularized particle filter (RPF) 
to conduct data assimilation in an AR1 model and the 
40-dimensional Lorenz 95 model. The experiment results 
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showed that the RPF with residual nudging (RPF-RN) 
outperforms the normal RPF in terms of filter accuracy 
and/or stability again divergence, especially when the 
normal RPF performs poorly. 

A problem that is not fully addressed in this work is the 
nonlinearity in the observation operator. We envision that 
residual nudging would be still applicable, with the same 
rationale in choosing the pre-specified threshold as 
discussed in the text below Eq. (7). When the observation 
operator Hk is continuous with respect to the model 
state, and there exists an observation inversion x£ such 
that Hk{x±) = Yfe, then the objective of residual nudging 
can be achieved, i.e., there exists a fraction coefficient 
Cfc G [0, 1] such that the residual norm of the estimate xJJ 
obtained through Eq. (8) is no larger than With 
nonlinearity, though, it may become more complicated 
in finding the estimate x£. While possible strategies in 
handling nonlinearity were mentioned in Section 3.1, how 
to implement them in numerically efficient ways will be 
investigated in the future. 
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A. Outline of the regularized particle filter 

Instead of using importance re-sampling as in the 
conventional bootstrap particle filter (Gordon et al. 1993), 
the RPF employs an alternative way to tackle the problem 
of particle degeneracy based on kernel density estimation 
(KDE, see, for example, Silverman 1986). The idea is to 
construct a continuous pdf based on the original particles 
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and their associated weights. This continuous pdf is treated 
as an approximation of the underlying pdf of the true model 
state, and is used to draw a new set of particles that is 
different from the original one almost surely (Doucet et al. 
2001, ch. 12). 

For illustration, suppose that at the /c-th assimilation cycle 
there is a set of N (original) particles {*-k,i}iLi> together 
with the corresponding weights {wk,i}fLi- As a resu lt, 



S k = 



J-f—J [\ /w kA( x k.l - Xfe), • • • , y/W k ,N(Xk,N ~ Xfc)] 

(A.l) 



is a square root of the weighted sample covariance with 
respect to the particles {x^}^, where 



N 

Xfe = ^ w k,i X M 
1=1 



(A.2) 



is the weighted sample mean. 

The continuous pdf to be constructed is then expressed in 
the form of (Doucet et al. 2001, ch. 12) 



JV 



p(x fe ) = ^ w k,iK 



i=l 



Xfe - Xfe.. 



(A.3) 



where K(») is the kernel function and h is a scalar 
parameter called the bandwidth (Silverman 1986). For the 
RPF implemented in this study, we use the Gaussian kernel 
and choose the bandwidth h according to the following rule 
(cf. Doucet et al. 2001, Eq. (12.2.7)): 



A = ( )V("+4) 
y n + 2' 



(A.4a) 
(A.4b) 



where n is the dimension of Xfe. 

The main procedures of the RPF implemented in this 
study are summarized below, largely following the style in 
Arulampalam et al. (2002, Algorithm 6). 

• Prediction step: FOR i = 1 to N 

Draw a prior sample i from the transition pdf 
p \^k\^k-i,ij' w ^ assign the weight Wk-i,i of 
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x &_i i to x fe %■ I n particular, if there is no dynamical 
noise, then x b ki = Mk,k-i{^l-i ti ), with -A/ffe^-i 
being the transition operator (cf Section 2). 
END FOR 

• Filtering step: FOR i = 1 to N 
Multiply the weight u>k-i,i of x^ . by the likelihood 

p(y° k Ki) 

END FOR 

Apply Eq. (4) to obtain the normalized weights 

• Re-sampling step: 

- Evaluate the difference 5k between the weight 

N 

"entropy" — w k,i log(wfe.i) and that with the 

i=l 

uniform weight 1/N, namely, 5k = log N + 

N 

E w k ,i log( W ;fe, i )(Pham2001) 

- IF 5k < 0.25 

No need to re-sample. Set x£ i = x| t and the 
associated weight Wk.i = Wk.i 
ELSE 
FOR i = 1 to N 
D> Draw a sample x from the set 
{x-l :i ,WkAiLi through importance 
re-sampling, as in the bootstrap particle 
filter 

D> Draw a sample rj from the Gaussian pdf 
N(0J N ) 

> Set x£ i = x + hSki] and the associated 

weight life, i = 1/N 
D> If desirable, introduce some additional 
"jittering" to x^ i 
END FOR 
END IF 
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Table I. Time mean effective sample sizes (ESS) of the normal RPF and RPF-RN (J5 = 0.02) with different assimilation steps S a and observation 
noise variances 7 in the 1/2 observation scenario. 
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Table II. Time mean RMSEs of the normal RPF and RPF-RN (/3 = 1) with (possibly) mis-specified forcing terms F and the observation noise 
variances 7 in the 1/40 observation scenario. 
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Figure 1. Time mean RMSEs of the RPF and RPF-RN as functions of the noise level coefficient /3, 
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(a) Time series of the fraction coefficient of the RPF-RN at /3 = 0.2 
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Figure 2. Time series of the fraction coefficients of the RPF-RN. Panel (a): /3 = 0.2; Panel (b): /3 = 2. 
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Figure 3. Time mean RMSEs of the RPF-RN as functions of the noise level coefficient f3 in different observation scenarios (dash-dotted lines with 
diamonds). For references, the corresponding time mean RMSEs of the normal RPF are also provided (solid horizontal lines). 
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Figure 4. Upper panel: A sample time series of the RMSEs of the RPF and RPF-RN (/? = 2) in the 1/2 observation scenario; Lower Panel: 
Corresponding time series of the fraction coefficient of the RPF-RN (J3 = 2). 
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Figure 5. Rank histograms of the first four elements of the particles with respect to the truths in the RPF and the RPF-RN (with = 6) in the full 
observation scenario. 
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Figure 6. Time mean RMSEs of (a) the EnKF; (b) the RPF; and (c) the RPF-RN, as functions of the sample size in the 1/2 observation scenario. Note 
that in the EnKF, filter divergence is spotted with sample size N = 1 and N = 10 so that the results of the EnKF are reported from N = 20. 
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Figure 7. Time mean RMSEs of the RPF and RPF-RN with different assimilation steps and observation noise variances in the 1/2 observation scenario. 
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Figure 8. As in Fig. 7, but it is now in the 1/40 observation scenario. 
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Figure 9. Time mean RMSEs of the RPF and RPF-RN in the 1/40 observation scenario. The experiment settings are: the sample size N ■ 
assimilation step S a = 12, and the observation noise variance 7 = 10. In the RPF-RN e {0.02 : 0.02 : 0.1, 0.2 : 0.2 : 1, 2, 3, 4, 6, 8}. 



5, the 



Copyright © 2012 Royal Meteorological Society 
Prepared using qjrms4.cls 



Q. J. R. Meteorol. Soc. 00: 1-?? (2012) 



32 



X. Luo and I. Hoteit 




Figure 10. Time mean RMSE of the RPF as a function of the driving term F and the observation noise variance 7 in the 1/2 observation scenario. 
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Figure 11. Time mean RMSE of the RPF-RN (with /3 = 1) as a function of the driving term F and the observation noise variance 7 in the 1/2 
observation scenario. 
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